Do not merge?: Subclass Sync - Direct in source, indirect in Mondo - Mini build #717

Draft: wants to merge 3 commits into base `subclass-sync-direct-source-indirect-mondo` from `subclass-sync-direct-source-indirect-mondo-mini-build`

joeflack4
Contributor

@joeflack4 joeflack4 commented Dec 3, 2024

@joeflack4 joeflack4 marked this pull request as draft December 3, 2024 02:27
@joeflack4
Contributor Author

@twhetzel This is ready for review, but I marked it as draft per the convention to not merge it. Although maybe @matentzn might want these files merged (at least the new ones) in order to pull them into a mondo PR, not sure.

@joeflack4 joeflack4 self-assigned this Dec 3, 2024
@joeflack4 joeflack4 added the build Mostly for build PRs: when changes only to data files post `build-mondo-ingest`; no code changes label Dec 3, 2024
@joeflack4 joeflack4 force-pushed the subclass-sync-direct-source-indirect-mondo-mini-build branch from 65f2659 to a72151f December 3, 2024 02:32
Member

@matentzn matentzn left a comment

One bug (probably synchronisation); otherwise just some interesting things that IMO should be manually spot-checked by @twhetzel in the ontology.

src/ontology/reports/doid.subclass.confirmed.robot.tsv (Outdated)
Member

This is an interesting fact to note all by itself. Since the only source of subclass axioms in OMIM is OMIMPS, I would have expected a much shorter list. What does this mean?

There are 830 cases where Mondo injects a class between an OMIMPS class and an OMIM class.

I would suggest @twhetzel verify 2 or 3 of these, because that sounds a bit odd to my ears. Maybe make an issue and have Chris or Sabrina sign off on these 2 or 3 cases.

Contributor Author

@joeflack4 joeflack4 Dec 4, 2024

I'll leave this to the experts, but just wanted to take a cursory glance. Here are some cataract terms.

I'm not sure what @matentzn means by injecting a class. At least in the one case I checked, MONDO:0007286 (cataract 30) corresponds to https://omim.org/entry/116300 (CATARACT 30, MULTIPLE TYPES), and MONDO:0005129 (cataract) corresponds to https://omim.org/entry/116200 (CATARACT 1, MULTIPLE TYPES). It seems to me that the subjects and objects for Mondo and OMIM are matched well here, and I don't see an (injected) intermediary class?

| subject_mondo_id | subject_mondo_label | object_mondo_id | subject_source_id | object_source_id | object_mondo_label |
| --- | --- | --- | --- | --- | --- |
| MONDO:0007286 | cataract 30 | MONDO:0005129 | OMIM:116300 | OMIMPS:116200 | cataract |

More cataract cases

| subject_mondo_id | subject_mondo_label | object_mondo_id | subject_source_id | object_source_id | object_mondo_label |
| --- | --- | --- | --- | --- | --- |
| MONDO:0007278 | cataract 32 multiple types | MONDO:0005129 | OMIM:115650 | OMIMPS:116200 | cataract |
| MONDO:0007279 | cataract 7 | MONDO:0005129 | OMIM:115660 | OMIMPS:116200 | cataract |
| MONDO:0007280 | cataract 8 multiple types | MONDO:0005129 | OMIM:115665 | OMIMPS:116200 | cataract |
| MONDO:0007283 | cataract 42 | MONDO:0005129 | OMIM:115900 | OMIMPS:116200 | cataract |
| MONDO:0007284 | cataract 20 multiple types | MONDO:0005129 | OMIM:116100 | OMIMPS:116200 | cataract |
| MONDO:0007286 | cataract 30 | MONDO:0005129 | OMIM:116300 | OMIMPS:116200 | cataract |
| MONDO:0007287 | cataract 41 | MONDO:0005129 | OMIM:116400 | OMIMPS:116200 | cataract |
| MONDO:0007288 | cataract 6 multiple types | MONDO:0005129 | OMIM:116600 | OMIMPS:116200 | cataract |
| MONDO:0007289 | cataract 13 with adult I phenotype | MONDO:0005129 | OMIM:116700 | OMIMPS:116200 | cataract |
| MONDO:0007290 | cataract 5 multiple types | MONDO:0005129 | OMIM:116800 | OMIMPS:116200 | cataract |

Member

What I mean is this. Consider this normal case:

Mondo:1 -> OMIMPS:1
Mondo:2 -> OMIM:2
Mondo:2 subclass Mondo:1

What this table is saying is that there must be some intermediate Mondo class Mondo:X like this:

Mondo:1 -> OMIMPS:1
Mondo:2 -> OMIM:2
Mondo:2 subclass Mondo:X
Mondo:X subclass Mondo:1

That's the definition of indirect. Else why would it appear in this table?
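
To make the direct/indirect distinction concrete, here is a minimal sketch, assuming the Mondo hierarchy is available as a plain mapping from each class to its asserted direct parents. The function name `classify_subclass` and the toy IDs are illustrative only and are not taken from the mondo-ingest code.

```python
from collections import deque


def classify_subclass(child, ancestor, parents):
    """Return 'direct', 'indirect', or 'none' for child -> ancestor.

    `parents` maps each class ID to the set of its asserted direct superclasses.
    """
    direct_parents = parents.get(child, set())
    if ancestor in direct_parents:
        return "direct"
    # Breadth-first search upward through the asserted hierarchy.
    seen = set(direct_parents)
    queue = deque(direct_parents)
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, set()):
            if parent == ancestor:
                return "indirect"
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return "none"


# The two toy cases from this comment.
normal_case = {"Mondo:2": {"Mondo:1"}}
indirect_case = {"Mondo:2": {"Mondo:X"}, "Mondo:X": {"Mondo:1"}}

print(classify_subclass("Mondo:2", "Mondo:1", normal_case))    # direct
print(classify_subclass("Mondo:2", "Mondo:1", indirect_case))  # indirect
```

Note that this sketch reports "direct" whenever a direct edge exists, even if an indirect path exists as well; whether the real pipeline treats that situation the same way is not something this thread confirms.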

Contributor Author

@joeflack4 joeflack4 Dec 4, 2024

Oh of course! Derp, I got it.

Contributor Author

Well, just to continue my above example, here's further detail with the intermediate class:

id: MONDO:0007289
name: cataract 13 with adult I phenotype
is_a: MONDO:0011060 {source="Orphanet:91492/btnt"} ! early-onset non-syndromic cataract

id: MONDO:0011060
name: early-onset non-syndromic cataract
is_a: MONDO:0005129 {source="Orphanet:91492", source="Orphanet:91492/inferred"} ! cataract
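
For anyone who wants to pull the same information out programmatically, here is a rough sketch that reads `is_a` lines and their `source` annotations from stanza text shaped like the excerpt above. It assumes the simplified layout shown here (no `[Term]` headers, one `id:` line per stanza); `parse_is_a` is a hypothetical helper, not part of any Mondo tooling, and a real OBO parser would be more robust.

```python
import re

STANZAS = """\
id: MONDO:0007289
name: cataract 13 with adult I phenotype
is_a: MONDO:0011060 {source="Orphanet:91492/btnt"} ! early-onset non-syndromic cataract

id: MONDO:0011060
name: early-onset non-syndromic cataract
is_a: MONDO:0005129 {source="Orphanet:91492", source="Orphanet:91492/inferred"} ! cataract
"""


def parse_is_a(text):
    """Map each term ID to a list of (parent ID, [source annotations])."""
    edges = {}
    current = None
    for line in text.splitlines():
        if line.startswith("id: "):
            current = line[len("id: "):].strip()
            edges.setdefault(current, [])
        elif line.startswith("is_a: ") and current is not None:
            # The parent ID is the first token after the "is_a:" tag.
            parent = line.split()[1]
            sources = re.findall(r'source="([^"]+)"', line)
            edges[current].append((parent, sources))
    return edges


edges = parse_is_a(STANZAS)
for term, parents in edges.items():
    for parent, sources in parents:
        print(term, "is_a", parent, "sources:", sources)
```

This makes the chain MONDO:0007289 -> MONDO:0011060 -> MONDO:0005129 explicit, along with the provenance recorded on each edge.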

Member

Needs curation eyes, sorry!

Contributor

@twhetzel twhetzel Dec 12, 2024

For the cataract example that Joe posted, there is one direct path in Mondo, i.e. MONDO:0007289 'cataract 13 with adult I phenotype' subClassOf MONDO:0005129 'cataract', and one indirect path, i.e. MONDO:0007289 'cataract 13 with adult I phenotype' subClassOf MONDO:0011060 'early-onset non-syndromic cataract' subClassOf MONDO:0005129 'cataract'.

Since the direct path already has the subClassOf provenance of OMIM:116700, is the general expectation, when both a direct path and an indirect path occur, that the provenance stated in the "confirmed-direct-source-indirect-mondo.robot" file is the same as the provenance that already exists because of the direct path? How are these updates applied to Mondo, e.g. could information added from confirmed be lost by then adding "confirmed-direct-source-indirect-mondo.robot"? Should there be a QC check so that a subClassOf source annotation does not exist for more than 1 external source?
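
As a starting point for discussing such a QC check, here is a hypothetical sketch; it is not an existing Mondo check, and whether multiple external sources on one axiom should actually be disallowed is exactly the open question. It assumes each subClassOf edge comes with a list of source CURIEs and treats the CURIE prefix as the "external source". The `EX:` edge and its source IDs are made up purely to show a flagged row.

```python
def external_prefixes(sources):
    """Return the set of CURIE prefixes, e.g. {'OMIM', 'Orphanet'}."""
    return {source.split(":", 1)[0] for source in sources}


def flag_multi_source_edges(edges):
    """Yield (child, parent, prefixes) for edges citing more than one external source."""
    for (child, parent), sources in edges.items():
        prefixes = external_prefixes(sources)
        if len(prefixes) > 1:
            yield child, parent, sorted(prefixes)


edges = {
    # From the stanza quoted earlier in this thread: one source, two annotations.
    ("MONDO:0011060", "MONDO:0005129"): ["Orphanet:91492", "Orphanet:91492/inferred"],
    # Made-up edge and source IDs, only to show what a flagged row would look like.
    ("EX:child", "EX:parent"): ["OMIM:000000", "Orphanet:000000"],
}

flagged = list(flag_multi_source_edges(edges))
print(flagged)  # [('EX:child', 'EX:parent', ['OMIM', 'Orphanet'])]
```

Grouping by prefix means that `Orphanet:91492` and `Orphanet:91492/inferred` count as a single source, which may or may not be the desired behavior.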

Contributor

For the first entry in src/ontology/reports/omim.subclass.confirmed-direct-source-indirect-mondo.robot.tsv, this does look suspicious in Mondo.
mondo-Bartter disease type 4B

Member

Super interesting!! 4500 CASES? WOW! If this is correct this is a huge story for the Rare Disease paper! Maybe verify 2 or 3 random ones!

Contributor Author

What is the achievement here? Is it just that Mondo has grown to provide super granular rare disease ontologization that a/the leading rare disease ontology, Orphanet, does not?

Member

The story is that the hierarchy already available through ORDO was enriched with more fine-grained groupings that are not in ORDO!

Contributor

There are 4,004 cases where the "indirect" path in Mondo is from the Mondo term up to MONDO:0000001 'disease'. The "direct" path in the source does not mean that there are not also indirect paths in the source that provide similar information, e.g. MONDO:0000023 'infantile liver failure'.

…ss-sync-direct-source-indirect-mondo-mini-build
- Update: Re-ran again, this time using the most up-to-date inputs.
@joeflack4 joeflack4 changed the title from "Subclass Sync - Direct in source, indirect in Mondo - Mini build" to "Do not merge?: Subclass Sync - Direct in source, indirect in Mondo - Mini build" Dec 5, 2024